o '■ Markov Chains of Infinite Order and Asymptotic 

> 
o , 

Satisfaction of Balance: Application to the Adaptive 
3 Integration Method 

ii 

O ■ David J. Earl and Michael W. Deem 

q 

CO 

• Sh ! Departments of Bioengineering and Physics & Astronomy 
>v 

C^; Rice University 

6100 Main Street— MS 142 

O 
in 

Houston, TX 77005-1892 

3 : February 9, 2008 

O 

• i— ( 

CO 

i-C . Corresponding author: M. W. Deem, mwdeem@rice.edu, fax: 713-348-5811. 

Oh' 



X 



1 



Abstract 

Adaptive Monte Carlo methods can be viewed as implementations of Markov chains 
with infinite memory. We derive a general condition for the convergence of a Monte 
Carlo method whose history dependence is contained within the simulated density 
distribution. In convergent cases, our result implies that the balance condition need 
only be satisfied asymptotically. As an example, we show that the adaptive integration 
method converges. 

1 Introduction 

Adaptive Monte Carlo methods that change the sampling strategy based upon statis- 
tics collected on the fly have been shown to be very powerful in a number of interesting 
applications. 1-5 Typically these adaptive methods use the statistics collected during a run 
to construct an importance sampling potential that is intended to remove the most signifi- 
cant barriers to sampling in the problem. These methods, however, have been criticized by 
various authors due to their lack of satisfying detailed balance. Although the use of adap- 
tive simulation methods is growing, and their success has been demonstrated in a number 
of cases, 4 widespread acceptance of the correctness of the approach within the simulation 
community has been hampered by these questions of detailed balance and so of convergence 
of the simulated distribution in phase space. In most of these methods, it is clear that there 
is at least one fixed point of the distribution, the Boltzmann, possibly modified by an impor- 
tance sampling factor that is itself a functional of the Boltzmann distribution. As the Monte 
Carlo algorithm will be started from an arbitrary initial condition, it is of interest to know 
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whether these algorithms will converge to the Boltzmann, or any other, fixed point. Such 
fixed point analysis has not been performed for this class of algorithms and is the subject of 
this paper. 

As an archetypal example of these adaptive Monte Carlo methods, we consider the 
adaptive integration scheme of Fasnacht, Swendsen, and Rosenberg. 6 In this method, we 
focus on one order parameter, A, of the system that leads to the most significant barriers. 
We construct an estimate of the probability distribution of A, and we use the inverse of this 
probability as the importance sampling function: 



where (3 is the inverse temperature, and x is a vector in 3n dimensions. We define 



P(A ) = (5(A(x) - A )> 



Jrfxe-^ x >(5(A(x) - Ap) 



(1) 




(2) 



Then the probability distribution is given by 



^(Ao) 



(3) 



Z 



where the configurational integral is Z = j rfxe /3C/ ( X ). Now, consider 




/dxe-^W^(A(x)-Ao) 



Jrfxe-^( x )5(A(x) - A ) 
Jcfarf(A(x)-A )^e-^W 
Jrfxe-/ 3f/ W ( 5(A(x) - A ) 
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-p 



/dxe-^Wffi(A(x)-A ) 
Jdxe-/ 3C/ ( x )5(A(x) - A, 



OJ 



where we have used the fact that both A and U are functions of x, and dll/dX really means 
VZ7 • dx/dA. Thus, we come to the thermodynamic integration formula for the free energy 
function of A: 



F(X ) = /,A(f) 

Ac Jdxe-^)fJ(A(x)-AQ 

A min Jdxe-^W5(A(x)-A') " 1 J 



We desire the observed distribution in the Monte Carlo scheme to be 

p(x) = (const)e- /3U(x) /P(A(x)) = (const) e - /3U(x)+/3F(x) . (6) 

As the simulation is performed, the estimated value of importance sampling free energy, 
F(A), is constructed from Eqn. |H1 and used in Eqn. El For example, if the distribution to 
date is p(x), then we have 



P(X 1 - [ X ° , y /^p(x)f£(A(x)-AQ m 
F(Ao) " A min dX /c /xp(x)5(A(x)-A') • (7) 



In the context of a Metropolis algorithm, the acceptance criterion would be 



acc 



n ) = m i n |l ? e -/3C/(x")+/3(/(x )+/3F(A(x"))-/3F(A(x°))| 



4 



Although we have written transition probabilities that satisfy detailed balance in Eq. |HJ 
our analysis is equally applicable to transition probabilities that satisfy only balance. 7 Since 
the importance sampling function is changing with step number, due to the updates to 
density and so to F, this adaptive algorithm does not satisfy detailed balance. It is clear, 
however, that if the density has converged to the Boltzmann, p(x) = (const)e _/3C/ ^ x - )+/3F ^ x - ) , 
then the estimation in Eqn. [7| is exact, and the acceptance criterion in Eqn. |H] is exact 
and constant for future steps. Thus, the desired exact answer, Eqn. El is a fixed point of 
this algorithm. We also note that if the observed density distribution is not exact, but if 
p(x)/p(y) = e _/3C/ * x ) +/3i7 ( y ) whenever A(x) = A(y), then the estimated importance sampling 
function, Eqn. is also exact. This property will prove to be useful. 

The Markov process underlying a Monte Carlo algorithm with acceptance probabili- 
ties such as Eqn. |HJ has memory, because the process depends on all past states through 
the observed density distribution p(x). Technically, this is a Markov chain of infinite order. 8 
Markov chains of infinite order have a solid body of convergence results when the dependence 
on the past, essentially, decays exponentially fast (technically, when they are continuous with 
respect to the past). 9 The Markov process in adaptive integration is dramatically discontinu- 
ous with respect to the past: the first point observed is just as important as the most recent 
point observed in the measured density distribution. 

We here consider the infinite order Markov process that occurs in adaptive Monte Carlo. 
In Sec. 121 we derive a general condition on the transition matrices for adaptive Monte Carlo 
that guaranties convergence to a unique distribution. Although we often use continuous 
notation, technically we limit discussion to finite, discrete spaces, in the belief that the 
continuous limit exists as the grid spacing goes to zero. In Sec. El we examine the special 
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case of adaptive integration Monte Carlo, showing convergence to a unique distribution 
occurs. We discuss our findings in Sec. EJ 



2 Theory 



We wish to find the conditions under which a Markov chain of infinite order converges to 
a unique limiting distribution. We consider the chain to possess a transition probability that 
depends on the current and future state and on the density constructed from all previously 
observed states, as in Eqns. EHE1 We assume that the transition probabilities satisfy a 
generalization of the irreducibility condition: we assume there is c > such that 



for all x, y, p, and t for some fixed m. This is a precise statement of our desire that the process 
be ergodic, with mixing time m. Thus, we have Prob[x(£)] > c for all times t > m, because 
we can apply Eqn. |U]to each of the initial states 1, . . . , m — 1, and then iterate to conclude 
Prob[x(m' + m)] = X) x ' Prob[x'(m'), p(w!) —>■ x(m + m')]Prob[x'(m')] > cX) x ' Prob[x'(m')] = 
c. In fact, we consider a larger value of m, so that any given Markov chain has an observed 
density that equals the expected probability distribution to 0(1/ '^/m). m Now consider a 
Markov chain that has run for M Monte Carlo steps, M 3> m. For this process, it will be 
true that 



Prob[x(t), p(t) -> y(t + m)]> c 



(9) 



p(M + m)= p(M) + 0(m/M) = p{M) . 



(10) 
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Thus, the transition matrix A[p(M)] will be roughly constant during this time interval, since 
the density itself is not changing much, and assuming the transition matrix depends smoothly 
on the density. The distribution of new states during this time period will converge to 

Prob[x(M + m)] = A[p(M + m - l))A[p(M + m - 2)) . . . A[p(M))p(M) 
= A[p{M)} m p{M)+0{m 2 /M) 

= p*[p(M)]+0(r m ,m 2 /M) . (11) 

Here r < 1 is the second largest generalized eigenvalue of A[p(M)]. The probability 
distribution is driven to the limiting distribution of the transition matrix for large m: 
Prob[x(M + m)] — > p*[p(M)]. By the Frobenius-Perron theorem, 11 this limiting distribution 
depends on the measured density, but not on the state at M: p* = linim^oo A m [p(M)]p(M) = 
linv^oo A m [p(M)]p for any p. By the central limit theorem of Markov processes, 10 any likely 
instance of this probability distribution will be accurate to 0(1/ y/m). Thus, the contribution 
of these m steps to the history-dependent density is 

"< M + m > = mT^ (m) + mT^>< m >1 + MT^°4- r "' ^ • < 12 » 

Since we consider the limit of 1 m 2 <ti M, we may drop the error terms. We let u = M ™ m - 
Then by the contraction mapping theorem on compact spaces 12 there will be a fixed point 
to Eqn. ^]if there is a metric, D, such that 

D [(1 - u)pi + up*[ Pl ], (1 - u)p 2 + up*[p 2 }} (13) 
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is initially decreasing as u increases from 0, for any arbitrary pi and p 2 - If this condition is 
satisfied, the fixed point exists and is unique for our finite, discrete system. 12 We note that 
if the following is satisfied for arbitrary p\ and P2, then Eqn. [T3]is automatically satisfied for 
small u: 

D[p*[ Pl lp*[p 2 }}<D[p h p 2 ] (14) 

Alternatively, we can consider the uniqueness and existence of the mapping p n+ i = p*(p n ) 
for arbitrary p . 



3 Application to Adaptive Monte Carlo 

The general condition, Eqn. El seems difficult to check for an arbitrary functional 
dependence on the measured density, p* [p] . We, thus, specialize consideration to the adaptive 
integration method. We rewrite Eqn. ^] as 



p(t + At) = (l-^)p+^p*{p(t)) (15) 



where At = m. Assuming that this difference equation is well-approximated by a differential 
equation, we find 

Tt At = —T + T p [p] (16) 

d i=~yi P \- P ) 

llP P*[P}-P (18) 



dliat 
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Letting a = hit, 



|W[p]-p (19) 



We note that for p < p*, p increases, whereas for p > p*, p decreases. Therefore p = p* 
informally appears to be a stable fixed point. 

We now consider more carefully the function p*[p\. Letting t 3> M, we find 



M , , . M N ^ p., 

p{t) = —(arbitrary initial p) + (1 - — ) ^ _ (20) 

1 1 i=M/m ^ m)/m 



where the density at timet = iAt, pi = p* = e~ u /Pi(X), is correct for agiven A, Pi(x)/pi(y) = 
e -/3C/(x)+/3c/(y) ^ £ or ^jch p^(A) has not converged to Eqn. This result for the ratio of 
the density follows from Eqns. IHHH1 Thus, for a given A, 



p(t)(x) = (const) exp-^W +0(— ) . (21) 



Eqns. |BHEl m the limit 1 <C m 2 « M « t thus imply that F and p are becoming exact: 



Pip] = P* + 5 -fh 
dp 

= p* + 0(M/t) . (22) 



Thus, Eqn. HH1 becomes 



^ = p*[p* + Sp] - p + 0(M/t,m 2 /t,r m ,l/^) 
da 

= p* -p + 0(Me- a ,mV a ,r m ,l/v^) , (23) 
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where we have reinserted the additional error terms from Eqn. Thus, 



p -> p* + 0(Me- a ,m 2 e- 



a 



,r m ,l/v^) 



(24) 



exponentially fast in a. Thus, in the limit 1 «C m 2 < M C we find 



p(t) -> p* + 0(M/t, m 2 /t, r m , 1/Vm, 1/t) 



(25) 



as t — ► oo, where the errors in Eqn.[2Hlare respectively from the error on F, the change of the 
density during At, the Monte Carlo convergence for a given F, the stochastic convergence 
of the density to the distribution, and the convergence of the differential equation. 

4 Discussion 

Our analysis of the convergence of the adaptive integration Monte Carlo scheme gives 
insight into why the convergence of the method is so rapid. That is, the adaptive method 
converges as fast as the underlying Monte Carlo method converges for density distributions 
with a given value of A. Once these estimates have converged, then the sampling over 
different values of A is typically exponentially accelerated by the importance sampling bias 
introduced by the P(X) factors in Eqn. |Ul so that a simple random walk in the A order 
parameter may be achieved without any barriers to sampling. Thus, if the most significant 
barriers in the problem are fully captured by the A order parameter, then the adaptive 
integration dramatically speeds up the convergence of the underlying Monte Carlo algorithm. 
In the context of the above analysis, an improved importance sampling estimate dramatically 
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reduces the number of steps, m, that it takes to reach equilibrium for a given value of A. 

In the above analysis, full convergence was shown. That is, convergence of both the 
density distribution for a given value of A and the distribution of A was shown. Since adap- 
tive integration Monte Carlo is a form of iteration, we see the linear convergence, as 0(l/t), 
in Eqn. EI3 In the analysis of the Monte Carlo data, we will remove the bias introduced 
by the P(X). That is, we will adjust the observed densities, scaling out the -P(A) depen- 
dence by histogram reweighting. Incidentally, we note from Eqn. |2Dl that when reweighting 
the histograms, one should use the the average of the calculated importance sampling fac- 
tors, (l/Pj(A)), rather than the instantaneous importance sampling factor, l/P t (X). Such a 
reweighting procedure implies by Eqns. EHZI that once the simulated density has converged 
within a given value of A (in 0(m) steps), the reweighted density has also converged for 
all A. So, the slow linear convergence observed in Eqn. ESI is actually not a great concern. 
Conversely, our major algorithmic interest is in the exponential reduction of the sampling 
time, m, within a given value of A, which is already induced by an only roughly converged 
importance sampling bias, -P(A). 

In conclusion, detailed balance, or balance, 7 need not be satisfied at all times in a 
Monte Carlo simulation. Balance need only be satisfied asymptotically. Indeed, the desire 
to maintain balance comes only from the Markov chain theory that shows such an approach 
converges to a unique limiting distribution. Any Monte Carlo method that converges to the 
specified unique limiting distribution will be equally valid. Given the success of the adaptive 
Monte Carlo methods, it would appear that the importance of detailed balance is over-rated. 
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